Statistical learning methods as a preprocessing step for survival analysis: evaluation of concept using lung cancer data

نویسندگان

  • Madhusmita Behera
  • Erin E Fowler
  • Taofeek K Owonikoko
  • Walker H Land
  • William Mayfield
  • Zhengjia Chen
  • Fadlo R Khuri
  • Suresh S Ramalingam
  • John J Heine
چکیده

BACKGROUND Statistical learning (SL) techniques can address non-linear relationships and small datasets but do not provide an output that has an epidemiologic interpretation. METHODS A small set of clinical variables (CVs) for stage-1 non-small cell lung cancer patients was used to evaluate an approach for using SL methods as a preprocessing step for survival analysis. A stochastic method of training a probabilistic neural network (PNN) was used with differential evolution (DE) optimization. Survival scores were derived stochastically by combining CVs with the PNN. Patients (n = 151) were dichotomized into favorable (n = 92) and unfavorable (n = 59) survival outcome groups. These PNN derived scores were used with logistic regression (LR) modeling to predict favorable survival outcome and were integrated into the survival analysis (i.e. Kaplan-Meier analysis and Cox regression). The hybrid modeling was compared with the respective modeling using raw CVs. The area under the receiver operating characteristic curve (Az) was used to compare model predictive capability. Odds ratios (ORs) and hazard ratios (HRs) were used to compare disease associations with 95% confidence intervals (CIs). RESULTS The LR model with the best predictive capability gave Az = 0.703. While controlling for gender and tumor grade, the OR = 0.63 (CI: 0.43, 0.91) per standard deviation (SD) increase in age indicates increasing age confers unfavorable outcome. The hybrid LR model gave Az = 0.778 by combining age and tumor grade with the PNN and controlling for gender. The PNN score and age translate inversely with respect to risk. The OR = 0.27 (CI: 0.14, 0.53) per SD increase in PNN score indicates those patients with decreased score confer unfavorable outcome. The tumor grade adjusted hazard for patients above the median age compared with those below the median was HR = 1.78 (CI: 1.06, 3.02), whereas the hazard for those patients below the median PNN score compared to those above the median was HR = 4.0 (CI: 2.13, 7.14). CONCLUSION We have provided preliminary evidence showing that the SL preprocessing may provide benefits in comparison with accepted approaches. The work will require further evaluation with varying datasets to confirm these findings.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prediction of Breast Tumor Malignancy Using Neural Network and Whale Optimization Algorithms (WOA)

Introduction: Breast cancer is the most prevalent cause of cancer mortality among women. Early diagnosis of breast cancer gives patients greater survival time. The present study aims to provide an algorithm for more accurate prediction and more effective decision-making in the treatment of patients with breast cancer. Methods: The present study was applied, descriptive-analytical, based on the ...

متن کامل

Survival and Factors Affecting it in Lung Cancer Patients Referred to Imam Khomeini Clinic in Hamadan Province

Background: Lung cancer is one of the most common cancers and the leading cause of death due to cancer in the world. It has the highest mortality rate compared to breast, prostate, and other cancers. Different factors can be effective in the survival of lung cancer patients. The present study has evaluated survival and its related factors. Materials and Methods: The present study was performed...

متن کامل

Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification

Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port and payload based methods. In this paper, we propose a metho...

متن کامل

The Effect of Time-dependent Prognostic Factors on Survival of Non-Small Cell Lung Cancer using Bayesian Extended Cox Model

  Abstract Background: Lung cancer is one of the most common cancers around the world. The aim of this study was to use Extended Cox Model (ECM) with Bayesian approach to survey the behavior of potential time-varying prognostic factors of Non-small cell lung cancer. Materials and Methods: Survival status of all 190 patients diagnosed with Non-Small Cell lung cancer referring to hospitals in ...

متن کامل

The prediction of lymphedema via the combination of the selected data mining algorithms

Background: Breast cancer is the second leading cause of cancer death in women, after lung cancer. Due to the importance of predicting this disease, the use of data mining methods in medical research is more significant than before. Data mining algorithms can be a great help in preventing the development of lymphedema in patients. The aim Of this study was to create a diagnosis system that can ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2011